Reading: <a href="http://www.cs.cmu.edu/~tom/mlbook/NBayesLogReg.pdf">Generative and Discriminative Classifiers</a> by Tom Mitchell.
Naïve Bayes assumes conditional independence of the $X_i$'s: $P(X_1,...,X_d|Y) = \prod_i P(X_i|Y)$
(more on this assumption soon!)
Using Bayes rule with that assumption:
$$ P(Y=y_k|X_1,...,X_d) = \frac{P(Y=y_k) \prod_i P(X_i|Y=y_k) }{P(X)} $$
Train the algorithm (estimate $P(X_i|Y=y_k)$ and $P(Y=y_k)$).
To classify, pick the most probable label for a new sample $X^{\text{new}} = (X_1^{\text{new}},X_2^{\text{new}},...,X_d^{\text{new}})$ as:
$$ Y^{\text{new}} = \arg\max_{y_k} P(Y=y_k) \prod_i P(X_i^{\text{new}}|Y=y_k) $$
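As a concrete illustration of this decision rule, here is a minimal sketch with two binary features and hand-picked (made-up) probability tables; in practice these tables are estimated from data:

```python
import numpy as np

# Toy Naive Bayes decision rule. All numbers are invented for illustration.
# Two classes, two binary features X1, X2.
prior = np.array([0.6, 0.4])                 # P(Y=0), P(Y=1)
# cond[i, k, v] = P(X_i = v | Y = y_k)
cond = np.array([
    [[0.9, 0.1], [0.3, 0.7]],                # feature X1
    [[0.5, 0.5], [0.2, 0.8]],                # feature X2
])

def predict(x):
    """Return argmax_k P(Y=y_k) * prod_i P(X_i=x_i | Y=y_k)."""
    scores = prior.copy()
    for i, v in enumerate(x):
        scores *= cond[i, :, v]
    return int(np.argmax(scores))

print(predict([0, 0]))  # 0
print(predict([1, 1]))  # 1
```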
Training:
But... how do we estimate these parameters?
$P(X|Y=y_k)$ has parameters $\theta_{ijk}$, one for each value $x_{ij}$ of each $X_i$.
$P(Y)$ has parameters $\pi$.
To follow the MLE principle, we pick the parameters $\pi$ and $\theta$ that maximize the (conditional) likelihood of the data given the parameters.
To estimate: $\hat\pi_k = \frac{\#D\{Y=y_k\}}{|D|}$ and $\hat\theta_{ijk} = \frac{\#D\{X_i = x_{ij} \wedge Y=y_k\}}{\#D\{Y=y_k\}}$ (the empirical frequencies).
To follow the MLE principle, we pick the parameters $\theta$ that maximize the conditional likelihood of the data given the parameters.
To follow the MAP principle, we pick the parameters $\theta$ with maximum posterior probability given the data and the prior on $\theta$.
To estimate: $\hat\theta_{ijk} = \frac{\#D\{X_i = x_{ij} \wedge Y=y_k\} + (\beta_{ij}-1)}{\#D\{Y=y_k\} + \sum_{j'}(\beta_{ij'}-1)}$ (observed counts plus hallucinated counts from the prior).
Example prior for $\pi_k$ where $K > 2$: a Dirichlet, $P(\pi_1,...,\pi_K) \propto \prod_k \pi_k^{\beta_k - 1}$.
If $K=2$ this becomes a Beta prior.
Example prior for $\theta_{ijk}$ where $J > 2$: likewise a Dirichlet over $(\theta_{i1k},...,\theta_{iJk})$.
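A quick sketch of what a Dirichlet prior produces, using NumPy's `Generator.dirichlet` (the concentration values here are arbitrary): each draw is itself a valid probability vector.

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw samples from a symmetric Dirichlet prior over K = 3 class
# probabilities (the concentration parameters are illustrative).
samples = rng.dirichlet(alpha=[2.0, 2.0, 2.0], size=5)

# Each draw is a valid probability vector: nonnegative, sums to 1.
print(samples.shape)        # (5, 3)
print(samples.sum(axis=1))  # all ~1.0
```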
These priors will act as imaginary examples that smooth the estimated distributions and prevent zero values.
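A minimal sketch of this smoothing effect, with invented counts for one categorical feature within one class: the MLE assigns probability zero to an unseen value, while adding one hallucinated example per value (a Dirichlet prior with all $\beta_j = 2$, i.e. Laplace smoothing) does not.

```python
import numpy as np

# Invented counts of a 3-valued categorical feature X_i within one class y_k.
# Value 2 never occurs in this toy data, so the MLE gives it probability 0.
counts = np.array([7, 3, 0])

theta_mle = counts / counts.sum()

# Adding beta - 1 = 1 hallucinated example per value removes the zero.
beta = 2
theta_map = (counts + beta - 1) / (counts + beta - 1).sum()

print(theta_mle)  # [0.7 0.3 0. ]
print(theta_map)  # no zeros, still sums to 1
```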
How shall we represent text documents for Naïve Bayes?
What are the limitations with this representation?
$X = (X_1, X_2, … X_d)$ = document
$X_i$ is a random variable describing the word at position i in the document
How many parameters do we need to estimate $P(X|Y)$? (say 1,000 words per document, 10,000 words in English)
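A back-of-the-envelope count using the illustrative numbers above (assuming $K = 2$ classes for this sketch):

```python
# Parameter counts for P(X|Y). Numbers from the text; K = 2 is an assumption.
d = 1000      # words per document
J = 10_000    # vocabulary size
K = 2         # number of classes

# With no assumptions, P(X_1,...,X_d | Y) is a full joint over J^d documents.
full_joint = K * (J**d - 1)

# With conditional independence: one distribution over J values per position.
naive_bayes = K * d * (J - 1)

# With the bag-of-words assumption: one shared distribution per class.
bag_of_words = K * (J - 1)

print(naive_bayes)   # 19998000
print(bag_of_words)  # 19998
```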
The conditional independence assumption is very useful:
Additional assumption: position doesn't matter!
(this is not true of language, but can be a useful assumption for building a classifier)
assume the $X_i$ are identically distributed given $Y$: $$ P(X_i|Y) = P(X_j|Y) \quad \forall\, i,j$$
we call this "Bag of Words"
Art installation in Gates building (now removed)
Since all $X_i$s have the same distribution, we only have to estimate one parameter per word, per class.
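A minimal sketch of the resulting estimation, on an invented four-word vocabulary and three tiny labeled "documents" (with one hallucinated count per word to avoid zeros):

```python
import numpy as np

# Estimate per-class word probabilities under the bag-of-words model from a
# tiny made-up corpus (vocabulary and documents are invented).
vocab = ["ball", "game", "vote", "law"]
docs = [("ball game game", "sports"),
        ("vote law vote",  "politics"),
        ("game ball",      "sports")]

classes = sorted({y for _, y in docs})
counts = {y: np.ones(len(vocab)) for y in classes}  # +1 hallucinated count

for text, y in docs:
    for w in text.split():
        counts[y][vocab.index(w)] += 1

# theta[y][j] = P(w_j | Y=y): one parameter per word, per class.
theta = {y: c / c.sum() for y, c in counts.items()}
print(theta["sports"])
```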
$P(X|Y=y_k)$ is a multinomial distribution:
$$ P(X|Y=y_k) \propto \theta_{1k}^{\alpha_{1k}} \theta_{2k}^{\alpha_{2k}} \cdots \theta_{Jk}^{\alpha_{Jk}}$$
The probability of observing a document with $\alpha_1$ counts of $w_1$, $\alpha_2$ counts of $w_2$, ... is a multinomial:
$$\frac{|D|!}{\alpha_1!\cdots \alpha_J!} \theta_1^{\alpha_1} \theta_2^{\alpha_2} \theta_3^{\alpha_3} \cdots \theta_J^{\alpha_J}$$
Dirichlet prior examples:
(Dirichlet is the conjugate prior for a multinomial likelihood function)
\begin{eqnarray} \theta_{jk} = \frac{\alpha_{jk} + \beta_{jk}-1}{\sum_m (\alpha_{mk} + \beta_{mk}-1)} \end{eqnarray}
Again, the prior acts like hallucinated examples.
What $\beta$s should we choose?
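A small numerical sketch of how the choice of $\beta$ matters, plugging made-up word counts $\alpha$ into the smoothed estimate above: larger $\beta$ pulls the estimate toward uniform.

```python
import numpy as np

# MAP estimate theta_jk = (alpha_jk + beta_jk - 1) / sum_m (alpha_mk + beta_mk - 1)
# for one class k, with invented counts and two choices of a shared beta.
alpha = np.array([50, 10, 0, 0])   # observed word counts in class k

for beta in (2, 11):               # beta - 1 hallucinated counts per word
    theta = (alpha + beta - 1) / (alpha + beta - 1).sum()
    print(beta, theta.round(3))
```

With $\beta=11$ the estimate is visibly flattened relative to $\beta=2$, while both avoid zeros.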
For code and data, see www.cs.cmu.edu/~tom/mlbook.html click on “Software and Data”.
We can group the labels so that related classes share priors:
comp.graphics, comp.os.ms-windows.misc, comp.sys.ibm.pc.hardware, comp.sys.mac.hardware, comp.windows.x
misc.forsale
rec.autos, rec.motorcycles, rec.sport.baseball, rec.sport.hockey
alt.atheism, soc.religion.christian, talk.religion.misc
talk.politics.mideast, talk.politics.misc, talk.politics.guns
sci.space, sci.crypt, sci.electronics, sci.med
• Naïve Bayes: 89% classification accuracy
Even when taking half of the email
More recently, algorithms such as LSTMs and Transformers have become very good at text classification.
What can we do?
E.g. image classification, where $X_i$ is real valued
Classify a person’s cognitive state, based on brain image
60 distinct exemplars, presented 6 times each
Mitchell et al. Science 2008, data available online.
$Y$ is the mental state (reading “house” or “bottle”)
$X_i$ are the voxel activities (voxel = volume pixel).
Naïve Bayes requires $P(X_i | Y=y_k)$ but $X_i$ is continuous:
$$ P(Y=y_k|X_1,...,X_d) = \frac{P(Y=y_k) \prod_i P(X_i|Y=y_k) }{\sum_\ell P(Y=y_\ell) \prod_i P(X_i|Y=y_\ell)} $$
What can we do?
Common approach: assume $P(X_i | Y=y_k)$ follows a Normal (Gaussian) distribution
$$p(X_i = x|Y=y_k) = \frac{1}{\sqrt{2\pi\sigma^2_{ik}}} \exp\big({-\frac{1}{2}\frac{(x-\mu_{ik})^2}{\sigma_{ik}^2}}\big)$$
Sometimes we additionally assume the standard deviation is independent of the class ($\sigma_i$), of the feature ($\sigma_k$), or of both ($\sigma$).
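A minimal one-feature sketch of this approach, with made-up means and standard deviations: evaluate the Gaussian likelihood under each class and combine with the prior.

```python
import math

# Gaussian class-conditional likelihoods for a single continuous feature.
# The parameters (mu, sigma) and prior below are invented for illustration.
def gaussian_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

mu0, sigma0 = -1.0, 1.0   # P(X_i | Y=0)
mu1, sigma1 = 2.0, 1.0    # P(X_i | Y=1)
prior1 = 0.5

x = 1.0
p1 = prior1 * gaussian_pdf(x, mu1, sigma1)
p0 = (1 - prior1) * gaussian_pdf(x, mu0, sigma0)
posterior1 = p1 / (p0 + p1)
print(posterior1)  # ~0.8176
```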
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from scipy.stats import norm
x1 = np.linspace(-10,10,1000)
x2 = np.linspace(-10,10,1000)
# Assume I know the true parameters, this is not the case usually!
mu_1_1 = -5; sigma_1_1 = 2
mu_2_1 = 5; sigma_2_1 = 2
mu_1_0 = 5; sigma_1_0 = 2
mu_2_0 = -5; sigma_2_0 = 2
# Sample data from these distributions
X_positive = norm.rvs(loc=[mu_1_1,mu_2_1], scale=[sigma_1_1, sigma_2_1], size = (100,2))
X_negative = norm.rvs(loc=[mu_1_0,mu_2_0], scale=[sigma_1_0, sigma_2_0], size = (100,2))
plt.figure(figsize=(8,8))
plt.scatter(X_positive[:, 0], X_positive[:, 1],facecolors='r', edgecolors='w')
plt.scatter(X_negative[:, 0], X_negative[:, 1],facecolors='b', edgecolors='w')
plt.axis([-10,10,-10,10],'equal')
P_X1_1 = norm.pdf(x1,mu_1_1,sigma_1_1)
P_X2_1 = norm.pdf(x1,mu_2_1,sigma_2_1)
P_X1_0 = norm.pdf(x1,mu_1_0,sigma_1_0)
P_X2_0 = norm.pdf(x1,mu_2_0,sigma_2_0)
plt.figure(figsize=(8,7))
plt.scatter(X_positive[:, 0], X_positive[:, 1],facecolors='r', edgecolors='w')
plt.scatter(X_negative[:, 0], X_negative[:, 1],facecolors='b', edgecolors='w')
lim_plot = 10
plt.plot(x1,P_X1_1*2*lim_plot-lim_plot,'r',linewidth=4)
plt.text(-7, -12, r'$P(X_1|Y=1)$', color = 'red',fontsize=24)
plt.plot(x1,P_X1_0*2*lim_plot-lim_plot,'b',linewidth=4)
plt.text(3, -12, r'$P(X_1|Y=0)$', color = 'blue',fontsize=24)
plt.figure(figsize=(8,8))
plt.scatter(X_positive[:, 0], X_positive[:, 1],facecolors='r', edgecolors='w')
plt.scatter(X_negative[:, 0], X_negative[:, 1],facecolors='b', edgecolors='w')
plt.axis([-lim_plot,lim_plot,-lim_plot,lim_plot],'equal');
plt.figure(figsize=(8,7))
plt.scatter(X_positive[:, 0], X_positive[:, 1],facecolors='r', edgecolors='w')
plt.scatter(X_negative[:, 0], X_negative[:, 1],facecolors='b', edgecolors='w')
lim_plot = 10
plt.plot(x1,P_X1_1*2*lim_plot-lim_plot,'r',linewidth=4)
plt.text(-7, -12, r'$P(X_1|Y=1)$', color = 'red',fontsize=20)
plt.plot(x1,P_X1_0*2*lim_plot-lim_plot,'b',linewidth=4)
plt.text(3, -12, r'$P(X_1|Y=0)$', color = 'blue',fontsize=20)
plt.plot(P_X2_1*2*lim_plot-lim_plot,x1,'r',linewidth=4)
plt.text(-16,5, r'$P(X_2|Y=1)$', color = 'red',fontsize=20)
plt.plot(P_X2_0*2*lim_plot-lim_plot,x1,'b',linewidth=4)
plt.text(-16,-5, r'$P(X_2|Y=0)$', color = 'blue',fontsize=20)
plt.axis([-lim_plot,lim_plot,-lim_plot,lim_plot],'equal')
# Compute log( P(Y=1|X)/ P(Y =0|X))
# as log( P(Y=1)P(X1|Y=1)P(X2|Y=1) / P(Y=0)P(X1|Y=0)P(X2|Y=0) )
# Using the real parameters. Usually, we have to estimate these!
X1,X2 = np.meshgrid(x1, x2)
def ratio_log(X1,X2):
    pY0 = 0.5; pY1 = 1 - pY0
    pY1pXY1 = pY1*norm.pdf(X1,mu_1_1,sigma_1_1)*norm.pdf(X2,mu_2_1,sigma_2_1)
    pY0pXY0 = pY0*norm.pdf(X1,mu_1_0,sigma_1_0)*norm.pdf(X2,mu_2_0,sigma_2_0)
    return np.log(pY1pXY1/pY0pXY0)
fX = ratio_log(X1,X2)
plt.figure(figsize=(10,8))
# plot contour plot
cs = plt.contourf(X1, X2, fX,20,cmap='RdBu_r',alpha=0.8);
plt.colorbar()
contours = plt.contour(cs, colors='k',alpha=0.4)
plt.contour(contours, levels=[0], linewidths=5)  # decision boundary
# previous stuff
plt.scatter(X_positive[:, 0], X_positive[:, 1],facecolors='r', edgecolors='w',s=60)
plt.scatter(X_negative[:, 0], X_negative[:, 1],facecolors='b', edgecolors='w',s=60)
lim_plot = 10
plt.plot(x1,P_X1_1*2*lim_plot-lim_plot,'r',linewidth=4)
plt.text(-7, -12, r'$P(X_1|Y=1)$', color = 'red',fontsize=20)
plt.plot(x1,P_X1_0*2*lim_plot-lim_plot,'b',linewidth=4)
plt.text(3, -12, r'$P(X_1|Y=0)$', color = 'blue',fontsize=20)
plt.plot(P_X2_1*2*lim_plot-lim_plot,x1,'r',linewidth=4)
plt.text(-16,5, r'$P(X_2|Y=1)$', color = 'red',fontsize=20)
plt.plot(P_X2_0*2*lim_plot-lim_plot,x1,'b',linewidth=4)
plt.text(-16,-5, r'$P(X_2|Y=0)$', color = 'blue',fontsize=20)
plt.axis([-lim_plot,lim_plot,-lim_plot,lim_plot],'equal')
def ratio_log_compute(X1,X2,params):
    pY0 = 0.5; pY1 = 1 - pY0
    pY1pXY1 = pY1*norm.pdf(X1,params['mu_1_1'],params['sigma_1_1'])*norm.pdf(X2,params['mu_2_1'],params['sigma_2_1'])
    pY0pXY0 = pY0*norm.pdf(X1,params['mu_1_0'],params['sigma_1_0'])*norm.pdf(X2,params['mu_2_0'],params['sigma_2_0'])
    return np.log(pY1pXY1/pY0pXY0)
def plot_GNB(X_positive,X_negative,params):
    pY0 = 0.5; pY1 = 1 - pY0
    P_X1_1 = norm.pdf(x1,params['mu_1_1'],params['sigma_1_1'])
    P_X2_1 = norm.pdf(x1,params['mu_2_1'],params['sigma_2_1'])
    P_X1_0 = norm.pdf(x1,params['mu_1_0'],params['sigma_1_0'])
    P_X2_0 = norm.pdf(x1,params['mu_2_0'],params['sigma_2_0'])
    X1,X2 = np.meshgrid(x1, x2)
    # faster way to compute the log ratio; equivalently:
    # fX = ratio_log_compute(X1,X2,params)
    fX = np.log(pY1/pY0) + np.log(P_X1_1.reshape([1000,1]).dot(P_X2_1.reshape([1,1000]))/
                                  P_X1_0.reshape([1000,1]).dot(P_X2_0.reshape([1,1000])))
    plt.figure(figsize=(10,8))
    # filled contour plot of the log ratio
    cs = plt.contourf(X1, X2, fX, 20, cmap='RdBu_r', alpha=0.8)
    plt.colorbar()
    contours = plt.contour(cs, colors='k', alpha=0.4)
    plt.contour(contours, levels=[0], linewidths=5)  # decision boundary
    # data points and the per-feature marginal densities, as before
    plt.scatter(X_positive[:, 0], X_positive[:, 1], facecolors='r', edgecolors='w', s=60)
    plt.scatter(X_negative[:, 0], X_negative[:, 1], facecolors='b', edgecolors='w', s=60)
    lim_plot = 10
    plt.plot(x1, P_X1_1*2*lim_plot-lim_plot, 'r', linewidth=4)
    plt.text(-7, -12, r'$P(X_1|Y=1)$', color='red', fontsize=20)
    plt.plot(x1, P_X1_0*2*lim_plot-lim_plot, 'b', linewidth=4)
    plt.text(3, -12, r'$P(X_1|Y=0)$', color='blue', fontsize=20)
    plt.plot(P_X2_1*2*lim_plot-lim_plot, x1, 'r', linewidth=4)
    plt.text(-16, 5, r'$P(X_2|Y=1)$', color='red', fontsize=20)
    plt.plot(P_X2_0*2*lim_plot-lim_plot, x1, 'b', linewidth=4)
    plt.text(-16, -5, r'$P(X_2|Y=0)$', color='blue', fontsize=20)
    plt.axis([-lim_plot, lim_plot, -lim_plot, lim_plot], 'equal')
What if the Gaussian Naïve Bayes assumptions don't hold, e.g. the features are correlated?
First: a case where the two classes have the same variances.
from scipy.stats import multivariate_normal
# Same param as before
mu_1_1 = -5; sigma_1_1 = 2
mu_2_1 = 5; sigma_2_1 = 2
mu_1_0 = 5; sigma_1_0 = 2
mu_2_0 = -5; sigma_2_0 = 2
cov_positive = np.array([[sigma_1_1**2,3], [3,sigma_2_1**2]] )
cov_negative = np.array([[sigma_1_0**2,3], [3,sigma_2_0**2]] )
print(cov_positive)
# Sample data from these distributions
X_positive = multivariate_normal.rvs(mean=[mu_1_1,mu_2_1], cov=cov_positive, size = (100))
X_negative = multivariate_normal.rvs(mean=[mu_1_0,mu_2_0], cov=cov_negative, size = (100))
plt.figure(figsize=(8,8))
plt.scatter(X_positive[:, 0], X_positive[:, 1],facecolors='r', edgecolors='w')
plt.scatter(X_negative[:, 0], X_negative[:, 1],facecolors='b', edgecolors='w')
plt.axis([-10,10,-10,10],'equal')
[[4 3] [3 4]]
# Assume I perfectly estimate the parameters (not true for limited data!)
params = dict(mu_1_1 = -5, sigma_1_1 = 2,
mu_2_1 = 5, sigma_2_1 = 2,
mu_1_0 = 5, sigma_1_0 = 2,
mu_2_0 = -5, sigma_2_0 = 2
)
plot_GNB(X_positive,X_negative,params)
# Estimate
mu_1_1, mu_2_1 = np.mean(X_positive,axis=0)
mu_1_0, mu_2_0 = np.mean(X_negative,axis=0)
# The estimated standard deviations come out roughly the same:
sigma_1_1, sigma_2_1 = np.std(X_positive,axis=0)
sigma_1_0, sigma_2_0 = np.std(X_negative,axis=0)
print(sigma_1_1, sigma_2_1)
print(sigma_1_0, sigma_2_0)
1.7522444858114437 1.966082854951404
1.8771045185177992 1.8285375175548788
Decision rule:
\begin{eqnarray} \ln \frac{P(Y=1|X_1...X_d)}{P(Y=0|X_1...X_d)} = \ln \frac{P(Y=1)}{P(Y=0)} + \sum_i\ln \frac{P(X_i|Y=1)}{P(X_i|Y=0)} ~ ~ ~ ~ ~ > \text{ or } <0? \end{eqnarray}
If the $X_i$s are $\mathcal{N}(\mu_{ik},\sigma_{ik})$:
$$p(X_i = x|Y=y_k) = \frac{1}{\sqrt{2\pi\sigma^2_{ik}}} \exp\big({-\frac{1}{2}\frac{(x-\mu_{ik})^2}{\sigma_{ik}^2}}\big)$$
\begin{eqnarray} \ln \frac{P(Y=1|X_1...X_d)}{P(Y=0|X_1...X_d)} &=& \ln \frac{P(Y=1)}{P(Y=0)} + \sum_i\ln \frac{P(X_i|Y=1)}{P(X_i|Y=0)} \\ &=& \ln \frac{P(Y=1)}{P(Y=0)} + \sum_i \ln \frac{\sigma_{i0}}{\sigma_{i1}} \exp \big( -\frac{1}{2} \frac{(x_{i}-\mu_{i1})^2}{\sigma_{i1}^2} + \frac{1}{2} \frac{(x_{i}-\mu_{i0})^2}{\sigma_{i0}^2} \big ) \\ &=& \ln \frac{P(Y=1)}{P(Y=0)} + \sum_i \ln \frac{\sigma_{i0}}{\sigma_{i1}} -\frac{1}{2} \sum_i \big( x_{i}^2\big(\frac{1}{\sigma_{i1}^2} - \frac{1}{\sigma_{i0}^2} \big) - 2 x_{i} \big(\frac{\mu_{i1}}{\sigma_{i1}^2} - \frac{\mu_{i0}}{\sigma_{i0}^2} \big) + \big(\frac{\mu_{i1}^2}{\sigma_{i1}^2} - \frac{\mu_{i0}^2}{\sigma_{i0}^2} \big) \big ) \end{eqnarray}
What happens if we force $\hat\sigma_{i0} = \hat\sigma_{i1}$?
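If we force $\hat\sigma_{i0} = \hat\sigma_{i1}$, the quadratic term in the expansion cancels and the log-odds becomes linear in $x$. A quick numerical check of this (all parameters below are made up):

```python
import numpy as np

# With shared standard deviations, the Gaussian NB log-odds is linear in x.
mu1, mu0 = np.array([1.0, -2.0]), np.array([-1.0, 3.0])
sigma = np.array([1.5, 0.8])          # shared between the two classes

def log_odds(x, prior1=0.5):
    # ln P(Y=1)/P(Y=0) + sum_i [ -(x_i-mu_i1)^2 + (x_i-mu_i0)^2 ] / (2 sigma_i^2)
    return (np.log(prior1 / (1 - prior1))
            + np.sum((-(x - mu1) ** 2 + (x - mu0) ** 2) / (2 * sigma ** 2)))

# Closed-form linear coefficients implied by the expansion: w . x + b
w = (mu1 - mu0) / sigma ** 2
b = np.sum((mu0 ** 2 - mu1 ** 2) / (2 * sigma ** 2))

x = np.array([0.3, -1.2])
print(np.isclose(log_odds(x), w @ x + b))  # True
```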
# Same param as before
mu_1_1 = -5; sigma_1_1 = 2
mu_2_1 = 5; sigma_2_1 = 2
mu_1_0 = 5; sigma_1_0 = 2
mu_2_0 = -5; sigma_2_0 = 2
cov_positive = np.array([[sigma_1_1**2,3], [3,sigma_2_1**2]] )
cov_negative = np.array([[sigma_1_0**2,3], [3,sigma_2_0**2]] )
print(cov_positive)
# Sample data from these distributions
X_positive = multivariate_normal.rvs(mean=[mu_1_1,mu_2_1], cov=cov_positive, size = (100))
X_negative = multivariate_normal.rvs(mean=[mu_1_0,mu_2_0], cov=cov_negative, size = (100))
plt.figure(figsize=(8,8))
plt.scatter(X_positive[:, 0], X_positive[:, 1],facecolors='r', edgecolors='w')
plt.scatter(X_negative[:, 0], X_negative[:, 1],facecolors='b', edgecolors='w')
plt.axis([-10,10,-10,10],'equal')
[[4 3] [3 4]]
params = dict()
# Estimate - Different variance
params['mu_1_1'], params['mu_2_1'] = np.mean(X_positive,axis=0)
params['sigma_1_1'], params['sigma_2_1'] = np.std(X_positive,axis=0)
params['mu_1_0'], params['mu_2_0'] = np.mean(X_negative,axis=0)
params['sigma_1_0'], params['sigma_2_0'] = np.std(X_negative,axis=0)
plot_GNB(X_positive,X_negative,params)
# Let's set up another example in which the variances are actually different
mu_1_1 = -5; sigma_1_1 = 3
mu_2_1 = 5; sigma_2_1 = 4
mu_1_0 = 5; sigma_1_0 = 2
mu_2_0 = -5; sigma_2_0 = 2
cov_positive = np.array([[sigma_1_1**2,3], [3,sigma_2_1**2]] )
cov_negative = np.array([[sigma_1_0**2,3], [3,sigma_2_0**2]] )
print(cov_positive)
# Sample data from these distributions
X_positive = multivariate_normal.rvs(mean=[mu_1_1,mu_2_1], cov=cov_positive, size = (100))
X_negative = multivariate_normal.rvs(mean=[mu_1_0,mu_2_0], cov=cov_negative, size = (100))
plt.figure(figsize=(8,8))
plt.scatter(X_positive[:, 0], X_positive[:, 1],facecolors='r', edgecolors='w')
plt.scatter(X_negative[:, 0], X_negative[:, 1],facecolors='b', edgecolors='w')
plt.axis([-10,10,-10,10],'equal')
[[ 9 3] [ 3 16]]
params = dict()
# Estimate - Different variance
params['mu_1_1'], params['mu_2_1'] = np.mean(X_positive,axis=0)
params['sigma_1_1'], params['sigma_2_1'] = np.std(X_positive,axis=0)
params['mu_1_0'], params['mu_2_0'] = np.mean(X_negative,axis=0)
params['sigma_1_0'], params['sigma_2_0'] = np.std(X_negative,axis=0)
plot_GNB(X_positive,X_negative,params)
from sklearn import datasets
X, y = datasets.make_circles(n_samples=200, factor=.25,noise=.1)
# scale
X_positive = X[y==1]*8
X_negative = X[y==0]*8
plt.figure(figsize=(8,8))
plt.scatter(X_positive[:, 0], X_positive[:, 1],facecolors='r', edgecolors='w')
plt.scatter(X_negative[:, 0], X_negative[:, 1],facecolors='b', edgecolors='w')
plt.axis([-10,10,-10,10],'equal')
params = dict()
# Artificially force same variances
params['mu_1_1'], params['mu_2_1'] = np.mean(X_positive,axis=0)
params['sigma_1_1'], params['sigma_2_1'] = np.std(np.vstack([X_positive,X_negative]),axis=0)
params['mu_1_0'], params['mu_2_0'] = np.mean(X_negative,axis=0)
params['sigma_1_0'], params['sigma_2_0'] = np.std(np.vstack([X_positive,X_negative]),axis=0)
plot_GNB(X_positive,X_negative,params)
params = dict()
# Estimate - Different variance
params['mu_1_1'], params['mu_2_1'] = np.mean(X_positive,axis=0)
params['sigma_1_1'], params['sigma_2_1'] = np.std(X_positive,axis=0)
params['mu_1_0'], params['mu_2_0'] = np.mean(X_negative,axis=0)
params['sigma_1_0'], params['sigma_2_0'] = np.std(X_negative,axis=0)
plot_GNB(X_positive,X_negative,params)
Naïve Bayes classifier